feat: add GPU temperature monitoring panel to dashboard #5

Open: andrewwhitecdw wants to merge 1 commit into run-ai:master from andrewwhitecdw:feat/add-temperature-metrics

@andrewwhitecdw
Add new "GPU Temperature" row with two time-series panels:

  • GPU Temperature (Avg): Average GPU temperature in Celsius with thresholds
  • GPU Max Temperature: Maximum GPU temperature tracking

Configuration:

  • Metric: DCGM_FI_DEV_GPU_TEMP
  • Unit: Celsius (°C)
  • Warning threshold: 80°C
  • Critical threshold: 90°C
  • Dynamic filtering by GPU type using $gpu_type variable
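As a rough illustration of the configuration above, the two panels might be generated along these lines. This is only a sketch, not the JSON from this PR: the `modelName` label used for the `$gpu_type` filter, the `avg`/`max` aggregations, the threshold colors, and the `temperature_panel` helper are all assumptions made for the example.

```python
import json

METRIC = "DCGM_FI_DEV_GPU_TEMP"
# Assumption: the GPU type is exposed on a label called "modelName".
# The label actually used by the dashboard may differ.
GPU_TYPE_LABEL = "modelName"


def temperature_panel(title, aggregation, panel_id):
    """Build a Grafana time-series panel dict for the GPU temperature metric.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    # PromQL query filtered by the $gpu_type dashboard variable.
    expr = f'{aggregation}({METRIC}{{{GPU_TYPE_LABEL}=~"$gpu_type"}})'
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "targets": [{"expr": expr, "refId": "A"}],
        "fieldConfig": {
            "defaults": {
                "unit": "celsius",
                "thresholds": {
                    "mode": "absolute",
                    "steps": [
                        # Base step has no lower bound (null in JSON).
                        {"color": "green", "value": None},
                        # Warning threshold from the description: 80°C.
                        {"color": "orange", "value": 80},
                        # Critical threshold from the description: 90°C.
                        {"color": "red", "value": 90},
                    ],
                },
            },
            "overrides": [],
        },
    }


panels = [
    temperature_panel("GPU Temperature (Avg)", "avg", 1),
    temperature_panel("GPU Max Temperature", "max", 2),
]

print(json.dumps(panels, indent=2))
```

The printed JSON could be compared against the panel definitions in the dashboard file to check the unit, thresholds, and variable filter during testing.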

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
andrewwhitecdw commented Feb 12, 2026

I pushed this for review by accident; I still need to test it, sorry.
